[Syntax] Use BumpPtrAllocator for Syntax node internals #2925
Conversation
Force-pushed from f270db9 to f390137
```swift
public subscript<RangeType: RangeExpression<Int>>(
  range: RangeType
) -> SyntaxArenaAllocatedBufferPointer<Element> {
  return SyntaxArenaAllocatedBufferPointer(UnsafeBufferPointer(rebasing: self.buffer[range]))
}
```
Using `Self` as `SubSequence` was an anti-pattern: it violated "The subsequence shares indices with the original collection".
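For context, a small sketch of the `Collection` rule being cited: a standard-library slice keeps the base collection's indices, whereas a rebased, zero-based copy does not, so the latter cannot serve as a conforming `SubSequence`.

```swift
// Array's real SubSequence (ArraySlice) shares indices with its base:
let numbers = [10, 20, 30, 40]
let slice = numbers[1...2]

// The slice's first valid index is 1, the same index as in `numbers`:
assert(slice.startIndex == 1)
assert(slice[1] == 20)

// A rebased copy restarts its indices at 0, breaking the index-sharing
// requirement that SubSequence conformance demands:
let rebased = Array(slice)
assert(rebased.startIndex == 0)
assert(rebased[0] == 20)
```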
Force-pushed from f390137 to f1bc852
Force-pushed from f9b6c7e to cf0292d
```swift
// If the buffer is already populated, return it.
if let baseAddress = baseAddressRef.pointee {
  return SyntaxDataReferenceBuffer(UnsafeBufferPointer(start: baseAddress, count: childCount))
}
```
We probably shouldn't read this memory before taking the lock, or we should use atomic reads/writes. But does it really matter?
Yeah I think you either need to guard this with the lock, or use atomics. Another thread could potentially perform the writes out-of-order, so the buffer may not be fully initialized. SE-0282 requires concurrent read/write accesses to be done using atomic operations (a lock also works as it would make them non-concurrent)
Thanks for the reference! Changed to use `_Atomic(const void *)`, and it doesn't seem to affect the performance numbers.
Nice! Could be worth seeing if this can be done purely using atomics to avoid needing `PlatformMutex`, e.g. compare-and-exchange a null pointer with `0x1` on the thread that computes the layout, with racing threads spinning until it becomes `> 0x1`.
Edit: Having thought about this a bit more, I'm not sure it's actually a good idea: we'd still need a mutex to guard the allocator, and a spinlock can potentially lead to deadlocks.
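A minimal sketch of the shape this thread converges on: an initialize-once slot whose slow path is serialized by a lock. The atomic fast-path load of the real implementation (the `_Atomic(const void *)` mentioned above) is replaced here by a lock-protected read to keep the sketch portable; `LazySlot` and its members are hypothetical names, not the PR's API.

```swift
import Foundation

// Hypothetical initialize-once storage. The real code performs an atomic
// load as a lock-free fast path (per SE-0282, a plain read racing with a
// write is not allowed); this sketch takes the lock on every access so it
// stays correct without an atomics dependency.
final class LazySlot<Value> {
    private var storage: Value?
    private let lock = NSLock()

    func value(compute: () -> Value) -> Value {
        lock.lock()
        defer { lock.unlock() }
        if let cached = storage {
            return cached  // already populated by an earlier caller
        }
        let fresh = compute()
        storage = fresh
        return fresh
    }
}
```

With real atomics, the read of `storage` would move before the lock and only the miss path would lock, re-checking the slot before computing (double-checked initialization).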
Force-pushed from 10d9545 to fd35cc9
Force-pushed from 2132a4f to 2829606
Force-pushed from 74beef9 to 6a5f0fe
Rebased on main after #2924
Force-pushed from 6a5f0fe to c53efa4
@swift-ci Please test
Very nice. 🚀 Looks great to me overall, I left a few comments inline.
```swift
/// Each node consumes at least the size of `SyntaxData`. Non-empty layout
/// nodes additionally tail-allocate pointer storage for the base address of
/// the layout buffer.
///
/// For layout buffers, each child element consumes a `SyntaxDataReference` in
/// the parent's layout. But for non-collection layout nodes, the layout is
/// usually sparse, so we can't calculate the exact memory size until we see
/// the RawSyntax. That being said, `SyntaxData` + 4 pointers looks like a
/// sufficient estimate.
```
I thought about this a bit and think that we should be a little more conservative with the memory that we allocate. My rationale is as follows (if you agree, I'll make this a doc comment).
We can’t know the size required to hold all `SyntaxData` for the entire tree until we have traversed it, because `totalNodes` only counts the non-nil syntax nodes in the tree (which require the size of `SyntaxData` plus the size of the `AtomicPointer` by which the `SyntaxData` is referenced from the parent node). Additionally, all layout children that are `nil` require the size of an `AtomicPointer`, but these `nil` slots are not accounted for in `totalNodes`, so we need to estimate the ratio of `nil` children to non-`nil` children. The facts that we can base our estimate on are:

- In a valid syntax tree, every layout node with n children has `n + 1` `unexpected*` children, which are `nil` for valid syntax trees, and valid trees account for the majority of syntax trees. We can thus expect at least one `nil` child for every non-`nil` child.
- Some layout nodes have optional children, which further increases the number of `nil` children.
- Collection nodes don’t have `nil` children, which offsets the optional children to some degree.

Based on that information, we can choose to over- or underestimate:

- Underestimating the slab size by a constant-ish factor of 2 or 3 is not a big deal because it means that we just need to allocate another 1 or 2 slabs if the user does a full-tree traversal. If the user doesn’t traverse the entire tree, we save a little memory.
- Overestimating the slab size when the full tree size is less than 4096 means that we allocate memory which we’ll never use.

We thus assume that every non-`nil` child has roughly one child that is `nil`.
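The rule of thumb above (roughly one `nil` child per non-`nil` child) can be written out as arithmetic. The 32-byte `SyntaxData` stride and 8-byte pointer are sizes taken from this discussion, and `estimatedSlabSize` is a hypothetical helper, not the PR's code.

```swift
// Assumed sizes from the discussion: SyntaxData stride = 32 bytes,
// pointer (AtomicPointer slot) = 8 bytes.
let dataSize = 32
let pointerSize = 8

// Each non-nil node needs its SyntaxData plus the pointer slot that
// references it from its parent; assuming roughly one nil child per
// non-nil child adds one more pointer slot per node.
func estimatedSlabSize(totalNodes: Int) -> Int {
    totalNodes * (dataSize + 2 * pointerSize)
}
```

Under this estimate each node costs 48 bytes, so a 4096-byte slab covers about 85 nodes before another slab is needed.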
I want to keep single-node trees minimal. Note that single-node trees are often created when rewriting a `TokenSyntax` in a `SyntaxRewriter`. `MemoryLayout<SyntaxData>.stride` (32 bytes) is enough for them, and I don't want to waste extra pointer-sized slots on them. So I think we should handle the root node specially.
As for the actual estimation, here's what I measured with our swift-syntax test suite:

- With my estimation logic, 1,043/998,482 (0.1%) of trees created with `slabSize < 4096` overflowed the estimated slab size.
- With your estimation logic, 54,601/1,047,632 (5.2%) of trees created with `slabSize < 4096` overflowed.
- FWIW, `(dataSize + pointerSize * 3) * nodeCount` still overflows in 28,417/1,020,774 trees.
And I think underestimating the slab size is more problematic than overestimating, because it wastes most of the second slab.
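The percentages quoted above follow directly from the measured counts; a quick check of the arithmetic (the helper name is ours, not the PR's):

```swift
// Fraction of slab-size estimates that overflowed, given the raw counts.
func overflowRate(_ overflowed: Int, of total: Int) -> Double {
    Double(overflowed) / Double(total)
}

// 1,043 / 998,482   ≈ 0.001 (0.1%)
// 54,601 / 1,047,632 ≈ 0.052 (5.2%)
```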
I didn’t think about the single-node trees. That’s a good point and a worthwhile optimization. Maybe worth a comment.
Your analysis is correct under the assumption that the user visits the entire tree, which might not necessarily be true. And it might be worth measuring how much memory in the slabs is not used with the different estimation logics.
Why is the denominator of your measurements different for the different estimation logics? I would expect them to be the same because you should have created the same number of trees by running the test suite.
> Your analysis is correct under the assumption that the user visits the entire tree, which might not necessarily be true.
FWIW my measurements are based on the actual test cases we have. They might differ from typical use-cases, but I just wanted to point that out.
It's true that users don't always visit the entire tree, but I don't think some overallocation is that problematic. (of course we want to avoid allocating memory that we know will never be used)
> And it might be worth measuring how much memory in the slabs is not used with the different estimation logics.
That would be interesting 👍
> Why is the denominator of your measurements different for the different estimation logics? I would expect them to be the same because you should have created the same number of trees by running the test suite.
The denominator is the number of "trees created with `slabSize < 4096`", which is different between the estimation logics.
We will revisit this.
Force-pushed from 03ed9c5 to 8949f20
Nice!
```diff
    return nil
  }

- return SyntaxIdentifier(rootId: Self.rootId(of: root.raw), indexInTree: indexInTree)
+ return SyntaxIdentifier(rootId: UInt(rawID: root.raw.id), indexInTree: indexInTree)
```
Not related to this PR, but out of curiosity, is there any reason `rootId` isn't `RawSyntax.ID`?
Maybe we thought it should be serializable. But at this point, I don't think there's a reason. Actually, I was thinking of making it just `RawSyntax.ID` in a follow-up :)
Now that you mention serializing, that might have been the reason when we still transferred the syntax tree via XPC or JSON from the C++ parser in 2018-ish.
Force-pushed from 8949f20 to 7a36dbb
@swift-ci Please test
Rework `Syntax` internals. "Red" tree is now bump-pointer allocated and visited children are cached.

* `Syntax` is a pair of the allocator (strong reference to the `SyntaxDataArena` class) and a pointer to the allocated data (the `SyntaxData` struct).
* All the red-tree data (parent and positions) are bump-pointer-allocated and cached.
* The arena and the tree are 1:1. Each mutation creates a new arena.
* `child(at:)` is now O(1) because the red-tree children are cached in the arena.
* `root` is now O(1) regardless of the depth because the arena holds the reference.
* Simplify `SyntaxChildren`. `SyntaxChildrenIndex` is now just an `Int`.

Remove some hacks as they aren't needed anymore:

* `UnownedSyntax`, because `SyntaxDataReference` replaces it.
* `SyntaxNodeFactory`, because `Syntax.Info` is gone.

Additionally:

* Flatten `AbsoluteSyntaxInfo` to simplify the data and clarify what `advancedBySibling(_:)` and `advancedToFirstChild()` do.
* Remove `RawSyntaxChildren` and `RawNonNilSyntaxChildren`, as they were implementation details of `SyntaxChildren`.
* Remove `AbsoluteRawSyntax` and `AbsoluteSyntaxPosition`, as nobody was really using them.
* Rename `Syntax.indexInParent` to `layoutIndexInParent` because it was confusing with `SyntaxProtocol.indexInParent: SyntaxChildrenIndex`.
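To make the "pair" description concrete, here is a heavily simplified, hypothetical sketch. The names mirror the PR, but every detail is assumed; the real `SyntaxData` is bump-pointer allocated and holds more state, and an array index stands in for the raw pointer so the sketch is self-contained.

```swift
// Per-node "red tree" data, allocated once per node in the arena.
struct SyntaxData {
    var layoutIndexInParent: Int
    var positionOffset: Int  // assumed: absolute UTF-8 offset of the node
}

// Owns every node's data for one tree; the real implementation uses a
// bump-pointer allocator, modeled here with an append-only array.
final class SyntaxDataArena {
    private var storage: [SyntaxData] = []

    func allocate(_ data: SyntaxData) -> Int {
        storage.append(data)
        return storage.count - 1
    }

    func data(at handle: Int) -> SyntaxData { storage[handle] }
}

// `Syntax` is just the pair: a strong reference to the arena plus a
// handle (standing in for a pointer) to this node's data.
struct Syntax {
    let arena: SyntaxDataArena  // keeps all node data alive
    let handle: Int

    var data: SyntaxData { arena.data(at: handle) }
}
```

Because the arena owns every node's data and holds the root reference, walking to `root` or to a cached child needs no per-node reference counting, which is what makes those accesses O(1).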
Force-pushed from 7a36dbb to 8fff0de
@swift-ci Please test
@swift-ci Please test Windows
Benchmark configurations:

* Baseline (main)
* Recreate `SyntaxDataArena` for each iteration.
* Reuse fully populated `SyntaxDataArena` for all iterations.
* Reuse fully populated `SyntaxDataArena`, combined with #2924.